Abstract: Motivated by problems in insurance, our task is to predict
finite upper bounds on a future draw from an unknown distribution $p$
over the set of natural numbers, using only past observations generated
independently and identically distributed according to $p$. While $p$ is
unknown, it is known to belong to a given collection $\mathcal{P}$ of
probability distributions on the natural numbers. The support of the
distributions $p \in \mathcal{P}$ may be unbounded, and the prediction
occurs for infinitely many draws. We are allowed to make observations
without predicting upper bounds for some time, but must start and then
continue to predict upper bounds after a finite time with probability 1
irrespective of which $p \in \mathcal{P}$ governs the observations.

If it is possible, without knowledge of $p$ and for any prescribed
confidence however close to 1, to come up with a sequence of upper bounds
that is never violated over an infinite time window with confidence at
least as big as prescribed, we say the model class $\mathcal{P}$ is
insurable. We characterize insurability of any class $\mathcal{P}$ of
distributions over natural numbers by a condition on how the neighborhood
of distributions $p \in \mathcal{P}$ should behave, one that is both
necessary and sufficient.

Keywords: insurance, $\ell_1$ topology, non-parametric approaches,
prediction, universal compression.



1. Introduction 



Insurance is a means of managing risk by transferring a potential sequence
of losses to an insurer for a price paid at the beginning of each period,
called the premium. The insurer attempts to break even by balancing the
possible loss that may be suffered by a few with the guaranteed premiums
of many. We aim to study the fundamentals of this problem when the losses
can be unbounded and a precise model for the probability distribution of
the aggregate loss in each period either does not exist or is infeasible to get.

A systematic, theoretical, as opposed to empirical, study of insurance 
goes back to 1903 when Filip Lundberg [1] defined a natural probabilistic
setting as part of his thesis. In particular, Lundberg formulated a collective 
risk problem pooling together the risk of all the insured parties into a single 
entity, which we call the insured. Typically, studies of insurance derived 
from the approach of [1] depend on working with specific models for the
loss distribution, e.g. compound Poisson models, after which questions of 
interest in practice, such as for instance the relation between the size of the 
premiums charged and the probability of the insurer going bankrupt, can be 
analyzed. A rather comprehensive theory of insurance along these lines has 
evolved [21, ^H] which incorporates several model classes for the distribution 
of the losses over time other than compound Poisson processes, and which 
also includes some heavy tailed distribution classes. 

We depart from the existing literature in two important respects. The first 
relates to the practice among insurers to limit payments to a predetermined 
ceiling, even if the loss suffered by the insured exceeds this ceiling. In both 
the insurance industry and the legal regulatory framework surrounding it, 
this is assumed to be common sense. But is it always necessary to impose 
such ceilings? Moreover, in scenarios such as reinsurance, a ceiling on
compensation is not only undesirable, but may also limit the very utility of the
business. 

The second unconventional aspect of our approach arises from our
motivation to deal with several new settings for which some sort of insurance
is desirable, but where insurers are hesitant to enter the market. Examples 
of such settings include insuring against network outages or attacks against 
future smart grids, where the cascade effect of outages or attacks could be 
catastrophic. In these settings, it is not clear today what should constitute 
a reasonable risk model because of the absence of usable information about 
what might cause the outages or motivate the attacks. We address this issue 
by working with a class of models, i.e. a set of probability laws over loss 
sequences that adheres to any assumptions the insurer may want to make 
or any information it may already have. In this paper we will only consider 
loss models that are independent and identically distributed (i.i.d.) from 
period to period, so we can equivalently think of a model class as defined 
in terms of its one dimensional marginals. As an example, we may want 
to consider the set of all finite moment probability distributions over the 
nonnegative integers as our class of possible models for the loss distribution 
in each period. Now, we ask the question: what classes of models are the 
ones on which the insurer can learn from observations and set premiums so
as to remain solvent? In this paper, we completely answer this question by 
giving a necessary and sufficient condition that characterizes what classes of 
models lend themselves to this insurance task. 

Formally, we adopt the collective risk approach, namely, we abstract the
problem to include just two agents, the insurer and the insured. Losses
incurred by the insured are considered to form a discrete time sequence of
random variables, with the sequence of losses denoted by $\{X_i, i \ge 1\}$,
and we assume that $X_i \in \mathbb{N}$ for all $i \ge 1$, where $\mathbb{N}$
denotes the set of natural numbers, $\{0, 1, 2, \ldots\}$. A model class
$\mathcal{P}^\infty$ is a collection of measures on infinite length loss
sequences, and is to be thought of as the set of all potential probability
laws governing the loss sequence. Each element of $\mathcal{P}^\infty$ is a
model for the sequence of losses. Any prior knowledge on the structure of
the problem is accounted for in the definition of $\mathcal{P}^\infty$. We
focus on measures corresponding to i.i.d. samples, i.e. each member of
$\mathcal{P}^\infty$ is a product distribution. We denote by $\mathcal{P}$
the set of distributions on $\mathbb{N}$ obtained as one dimensional
marginals of $\mathcal{P}^\infty$. Since there is no risk of confusion, we
will also refer to the distributions in $\mathcal{P}$ as models and to
$\mathcal{P}$ as the model class.

The actual model in $\mathcal{P}$ governing the law of the loss in each
period remains unknown to the insurer. We assume no ceiling on the loss, and
require the insurer to compensate the insured in full for the loss in each
period at the end of that period. The insurer is assumed to start with some
initial capital $\Pi_0 \in \mathbb{R}^+$, a nonnegative real number. The
insurer then sets a sequence of premiums based on the past losses: at time
$i$, the insurer collects a premium $\Pi(X^{i-1})$ at the beginning of the
period, and pays out to compensate for the loss $X_i$ at the end of the
period. If the built up capital till step $i$ (including $\Pi(X^{i-1})$, and
after having paid out all past losses) is less than $X_i$, the insurer is
said to be bankrupted. Given a class of loss models, we ask if for every
prescribed upper bound $\eta > 0$ on the probability of bankruptcy, the
insurer can set (finite) premiums at every time step based only on the loss
sequence observed thus far and with no further knowledge of which law
$p \in \mathcal{P}^\infty$ governs the loss sequence, such that the insurer
remains solvent with probability bigger than $1 - \eta$ under $p$
irrespective of which $p \in \mathcal{P}^\infty$ is in effect. If the
probability of the insurer ever going bankrupt over an infinite time window
can be made arbitrarily small in this sense, the class $\mathcal{P}^\infty$
of i.i.d. loss measures is said to be insurable.

A couple of clarifications are in order here. First, to make the problem
nontrivial, we allow the insurer to observe the loss sequence for some
arbitrary finite length of time without having to provide insurance. We
require that the insurer has to eventually provide insurance with
probability 1 no matter which $p \in \mathcal{P}^\infty$ is in effect, and
cannot quit providing insurance once it has entered into the insurance
contract with the insured. Premiums set before the entry time can be thought
of as being 0, and the question of bankruptcy only arises after the insurer
has entered into the contract. Secondly, we do not concern ourselves with
incentive compatibility issues on the part of the insured and assume that
the insured will accept the contract once the insurer has entered, agreeing
to pay the premiums as set by the insurer.

It turns out that the fact that the capital available to the insurer at any
time is built up from past premiums does not play any role in whether a
model class is insurable or not. In fact, the problem is basically one of
finding a sequence of finite upper bounds $\Phi(X^{i-1})$ on the loss $X_i$
for all $i \ge 1$. We refer to the sequence $\{\Phi(X^{i-1}), i \ge 1\}$ as
the loss dominating sequence and call $\Phi(X^{i-1})$ the loss-dominant at
step $i$. The notion of insurability of a model class $\mathcal{P}$ comes
down to whether for each $\eta > 0$ there is a way of choosing the loss
dominants in such a way that the probability of the loss $X_i$ ever
exceeding the loss dominant $\Phi(X^{i-1})$ is smaller than $\eta$
irrespective of which model $p$ in the model class $\mathcal{P}^\infty$ is
in effect. Here again we allow some initial finite number of periods for
which the loss dominant can be set to $\infty$, but it must become finite
with probability 1 under each $p \in \mathcal{P}^\infty$ and stay finite
from that point onwards.

Theoretically, the flexibility we have permitted regarding when to start
proposing finite loss dominants places the insurance problem formulated
above in a class of problems that can be said to admit useful pointwise
convergent estimates. Roughly speaking, the insurance problem can be thought
of as one of requiring estimating all the percentiles of an unknown
distribution from $\mathcal{P}$, using only i.i.d. draws generated from it.
However, as the sample size increases, the estimate of any given percentile
need not converge to the true value (according to some predefined metric)
uniformly over the entire class $\mathcal{P}$. Even if the estimate
converges only pointwise over the class, it is useful if, for any given
finite sample size, we could also say whether the estimate is doing well or
not relative to the true model even though we don't know what the true model
is. This is the case for the notion of insurability of a model class that we
have introduced above. More generally, when dealing with large alphabets or
high dimensions, it is sometimes too restrictive to require estimates or
algorithms to converge to the true values uniformly over the model class as
the sample size increases to infinity. Instead, this notion of useful
pointwise convergence allows us to consider broader model classes from a
practical perspective. For other kinds of such pointwise estimation,
particularly in relation to information theoretic quantities, see [1].

For a model class to be insurable, roughly speaking, close distributions
must have comparable percentiles. Distributions in the model class that, in
every neighborhood, have some other distribution with arbitrarily different
percentiles are said to be deceptive. In Section 2 we define what it means
for distributions to be close, and what it means for distributions to have
comparable percentiles. In Section 3 we provide several examples of
insurable and non-insurable model classes. Our main result is Theorem 1,
which states that $\mathcal{P}^\infty$ is insurable iff it has no deceptive
distributions. We prove this theorem in Sections 4 and 5.

2. Precise formulation of the problem and statement of the main 
result 

We model the loss at each time by a random variable taking values in
$\mathbb{N} = \{0, 1, \ldots\}$. Denote the sequence of losses by
$X_1, X_2, \ldots$ where $X_i \in \mathbb{N}$. Let $\mathbb{N}^*$ be the set
of all finite length sequences from $\mathbb{N}$, including the empty
sequence. We will write $x^n$ for the sequence $x_1, \ldots, x_n$. Where it
appears, $x^0$ denotes the empty sequence. A loss distribution is a
probability distribution on $\mathbb{N}$. Let $\mathcal{P}$ be a set of loss
distributions. $\mathcal{P}^\infty$ is the collection of i.i.d. measures
over infinite sequences of symbols from $\mathbb{N}$ such that the set of
one dimensional marginals over $\mathbb{N}$ they induce is $\mathcal{P}$.

We write $\mathbb{R}^+$ for the set of nonnegative real numbers and use :=
for equality by definition.

Consider an insurer with an initial capital $\Pi_0 \in \mathbb{R}^+$. An
insurance scheme for $\mathcal{P}$ is comprised of a pair $(\tau, \Pi)$.
Here $\tau : \mathbb{N}^* \to \{0, 1\}$ satisfies
$\tau(x_1, \ldots, x_n) = 1 \implies \tau(x_1, \ldots, x_{n+1}) = 1$ for all
$x^{n+1} \in \mathbb{N}^*$ and also $p(\sup_n \tau(X^n) = 1) = 1$ for all
$p \in \mathcal{P}^\infty$. $\tau$ should be thought of as defining an entry
time for the insurer with the property that once the insurer has entered it
stays entered and that the insurer enters with probability 1 irrespective of
which $p \in \mathcal{P}^\infty$ is in effect. Here we say the insurer
enters after seeing the sequence $x^n \in \mathbb{N}^*$ (possibly the empty
sequence) if $\tau(x^n) = 1$. The other ingredient of an insurance scheme is
the premium setting scheme $\Pi : \mathbb{N}^* \to \mathbb{R}^+$, satisfying
$\Pi(x^n) = 0$ if $\tau(x^n) = 0$, with $\Pi(x^n)$ being interpreted as the
premium demanded by the insurer from the insured after the loss sequence
$x^n \in \mathbb{N}^*$ is observed.

Let $\mathbf{1}(\cdot)$ denote the indicator function of its argument. The
event that the insurer goes bankrupt is the event that

$$\Pi_0 + \sum_{i=1}^{n} \left(\Pi(X^{i-1}) - X_i\right) \mathbf{1}\left(\tau(X^{i-1}) = 1\right) < 0 \quad \text{for some } n \ge 1.$$

In words, this is the event that in some period $n \ge 1$ after the insurer
has entered, the loss $X_n$ incurred by the insured exceeds the built up
capital of the insurer, namely the sum of its initial capital and all the
premiums it has collected after it has entered (including the currently
charged premium $\Pi(X^{n-1})$) less all the losses paid out so far.
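As an illustration of the bankruptcy event just described, the following Python sketch walks a finite loss sequence through an insurance scheme and reports the first period in which the running capital becomes negative. The names `first_bankruptcy`, `tau`, `premium`, `pi0` and the toy scheme are hypothetical stand-ins introduced here, not part of the paper, and a finite prefix can only illustrate an event that is defined over all $n \ge 1$.

```python
from typing import Callable, Optional, Sequence, Tuple

Prefix = Tuple[int, ...]


def first_bankruptcy(
    pi0: float,
    tau: Callable[[Prefix], int],
    premium: Callable[[Prefix], float],
    losses: Sequence[int],
) -> Optional[int]:
    """Return the first period n (1-indexed) at which
    pi0 + sum_{i<=n} (premium(x^{i-1}) - x_i) * 1(tau(x^{i-1}) = 1) < 0,
    or None if this never happens on the given finite sequence."""
    capital = pi0
    for n in range(1, len(losses) + 1):
        prefix = tuple(losses[: n - 1])      # x^{n-1}: losses seen before period n
        if tau(prefix) == 1:                 # only periods after entry contribute
            capital += premium(prefix) - losses[n - 1]
        if capital < 0:
            return n
    return None


# Toy scheme (an assumption for illustration only): enter after two
# observations, then charge twice the largest loss seen so far.
def toy_tau(prefix: Prefix) -> int:
    return 1 if len(prefix) >= 2 else 0


def toy_premium(prefix: Prefix) -> float:
    return 2.0 * max(prefix) if toy_tau(prefix) == 1 else 0.0
```

With zero initial capital, the toy scheme survives the losses 1, 2, 3, 4 but is bankrupted in period 3 by the losses 1, 2, 9, since the premium of 4 collected at entry does not cover the loss of 9.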

Definition 1. A class $\mathcal{P}^\infty$ of laws on loss sequences is
called insurable by an insurer with initial capital $\Pi_0 \in \mathbb{R}^+$
if $\forall \eta > 0$, there exists an insurance scheme $(\tau, \Pi)$ such
that $\forall p \in \mathcal{P}^\infty$,

$$p((\tau, \Pi) \text{ goes bankrupt}) < \eta.$$

We should remark that despite the apparent role of the initial capital of
the insurer in this definition, it plays no role from a mathematical point
of view. To see this note first that if a model class is insurable by an
insurer with capital $\Pi_0$ it is clearly insurable by all insurers with
initial capital at least $\Pi_0$, since such an insurer can use the same
entry time and premium setting scheme as the insurer with initial capital
$\Pi_0$. On the other hand, an insurer with initial capital less than
$\Pi_0$ can use the same entry time as an insurer with initial capital
$\Pi_0$ and simply charge an additional premium at the time of entry which
in effect builds up its initial capital to $\Pi_0$, and then proceed with
the same premium setting scheme as that used by the insurer with initial
capital $\Pi_0$. This feature is an artifact of the complete flexibility we
give the insurer in setting premiums; for more on this see the concluding
remarks in Section 6.

As indicated in the introductory Section 1, we will first show that whether
a model class of loss distributions is insurable is equivalent to whether we
can find suitable loss domination sequences for the sequence of losses. We
next make this connection and the associated terminology precise.

A loss domination scheme for $\mathcal{P}^\infty$ is a mapping
$\Phi : \mathbb{N}^* \to \mathbb{R}^+ \cup \{\infty\}$, where for
$x^n \in \mathbb{N}^*$, we interpret $\Phi(x^n)$ as an estimated upper bound
on $X_{n+1}$. We call $\{\Phi(X^{i-1}), i \ge 1\}$ the loss-domination
sequence and $\Phi(X^{i-1})$ the loss-dominant at step $i$. We require for
all $x^{n+1} \in \mathbb{N}^*$ that

$$\Phi(x_1, \ldots, x_n) < \infty \implies \Phi(x_1, \ldots, x_{n+1}) < \infty$$

and also that for all $p \in \mathcal{P}^\infty$,

$$p\left(\inf_{n \ge 1} \Phi(X^n) < \infty\right) = 1.$$

We think of $\Phi(x^n) = \infty$ as saying that the scheme has not yet
committed to proposing finite loss dominants after having seen the sequence
$x^n$, while if $\Phi(x^n) < \infty$ it has. Once the scheme commits to
proposing finite loss dominants it has to continue to propose finite loss
dominants from that point onwards. Further, with probability 1 under every
$p \in \mathcal{P}^\infty$ the scheme has to eventually start proposing
finite loss dominants. Given our motivation from the insurance problem, we
will say the loss domination scheme $\Phi$ goes bankrupt if
$\Phi(X^{n-1}) < X_n$ for some $n \ge 1$.

The connection between the insurance problem and the problem of selecting
loss dominants can now be made precise as follows.

Observation 1. Let $\mathcal{P}^\infty$ be a model class and $\eta > 0$. Let
$\Pi_0 \in \mathbb{R}^+$. An insurer with initial capital $\Pi_0$ can find
an insurance scheme $(\tau, \Pi)$ such that the probability of remaining
solvent is bigger than $1 - \eta$ irrespective of which
$p \in \mathcal{P}^\infty$ is in effect if and only if there is a loss
domination scheme $\Phi$ such that the probability of it going bankrupt is
less than $\eta$ irrespective of which $p \in \mathcal{P}^\infty$ is in
effect.

Proof. Given an insurance scheme $(\tau, \Pi)$ consider the loss domination
scheme $\Phi$ that has $\Phi(x^{n-1}) := \infty$ iff $\tau(x^{n-1}) = 0$ and

$$\Phi(x^{n-1}) := \Pi_0 + \sum_{i=1}^{n-1} \left(\Pi(x^{i-1}) - x_i\right) \mathbf{1}\left(\tau(x^{i-1}) = 1\right) + \Pi(x^{n-1}),$$

if $\tau(x^{n-1}) = 1$. Since $\tau$ enters (becomes equal to 1) with
probability 1 under each $p \in \mathcal{P}^\infty$ and stays equal to 1
once it has become 1, $\Phi$ becomes finite with probability 1 under each
$p \in \mathcal{P}^\infty$ and stays finite once it has become finite. Thus
$\Phi$ is indeed a loss domination scheme. It is straightforward to check
that if the insurance scheme $(\tau, \Pi)$ stays solvent with probability
bigger than $1 - \eta$ irrespective of which $p \in \mathcal{P}^\infty$ is
in effect then the loss domination scheme $\Phi$ becomes bankrupt with
probability less than $\eta$ irrespective of which
$p \in \mathcal{P}^\infty$ is in effect.

Conversely, given a loss domination scheme $\Phi$ define the insurance
scheme $(\tau, \Pi)$ by setting $\tau(x^n) := 0$ iff $\Phi(x^n) = \infty$
(and $\tau(x^n) := 1$ iff $\Phi(x^n) < \infty$) and defining
$\Pi(x^n) := 0$ if $\Phi(x^n) = \infty$ and $\Pi(x^n) := \Phi(x^n)$ if
$\Phi(x^n) < \infty$. One sees that $\tau$ as defined becomes 1 with
probability 1 under each $p \in \mathcal{P}^\infty$ and stays equal to 1
once it becomes 1. Further, the premiums set at each time are finite and
equal to 0 till the entry time. Thus $(\tau, \Pi)$ as defined is indeed an
insurance scheme. It is straightforward to check that if $\Phi$ becomes
bankrupt with probability less than $\eta$ irrespective of which
$p \in \mathcal{P}^\infty$ is in effect, then $(\tau, \Pi)$ stays solvent
with probability bigger than $1 - \eta$ irrespective of which
$p \in \mathcal{P}^\infty$ is in effect. Hence the above observation. □
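The two constructions used in this proof can be sketched in Python as follows. This is only a minimal sketch under stated assumptions: schemes are represented as functions of the tuple of losses observed so far, and the helper names `dominant_from_scheme`, `scheme_from_dominant` and `dominant_bankrupt_at` are hypothetical, introduced here for illustration.

```python
import math
from typing import Callable, Optional, Sequence, Tuple

Prefix = Tuple[int, ...]


def dominant_from_scheme(pi0: float, tau, premium) -> Callable[[Prefix], float]:
    """Loss domination scheme built from (tau, premium): infinite before
    entry, otherwise the capital available (including the premium just
    charged) before the next loss is paid."""
    def phi(prefix: Prefix) -> float:
        if tau(prefix) == 0:
            return math.inf
        capital = pi0
        for i in range(1, len(prefix) + 1):
            earlier = prefix[: i - 1]            # x^{i-1}
            if tau(earlier) == 1:
                capital += premium(earlier) - prefix[i - 1]
        return capital + premium(prefix)
    return phi


def scheme_from_dominant(phi):
    """Insurance scheme built from phi: enter once phi is finite and charge
    phi itself as the premium (zero before entry)."""
    def tau(prefix: Prefix) -> int:
        return 0 if math.isinf(phi(prefix)) else 1

    def premium(prefix: Prefix) -> float:
        return 0.0 if math.isinf(phi(prefix)) else phi(prefix)
    return tau, premium


def dominant_bankrupt_at(phi, losses: Sequence[int]) -> Optional[int]:
    """First n with phi(x^{n-1}) < x_n, or None on this finite sequence."""
    for n in range(1, len(losses) + 1):
        if phi(tuple(losses[: n - 1])) < losses[n - 1]:
            return n
    return None
```

The loss-dominant produced by `dominant_from_scheme` falls below the next loss exactly when the insurer's capital would become negative, which is the equivalence the observation relies on.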

We may therefore conclude that a model class $\mathcal{P}^\infty$ is
insurable iff for all $\eta > 0$ there is a loss domination scheme $\Phi$
such that the probability of going bankrupt under $\Phi$ is less than $\eta$
irrespective of which $p \in \mathcal{P}^\infty$ is in effect. In the rest
of the paper we will therefore focus mainly on whether the model class
$\mathcal{P}^\infty$ is such that for every $\eta > 0$ a loss domination
sequence $\Phi$ exists with its probability of bankruptcy being less than
$\eta$ irrespective of which model in the model class governs the sequence
of losses.

In Theorem 1 we provide a condition on $\mathcal{P}$ that is both necessary
and sufficient for insurability.

2.1. Close distributions 

Insurability of $\mathcal{P}^\infty$ depends on the neighborhoods of the
probability distributions among its one dimensional marginals $\mathcal{P}$.
The relevant "distance" between distributions in $\mathcal{P}$ that decides
the neighborhoods is

$$J(p, q) := D\left(p \,\Big\|\, \frac{p+q}{2}\right) + D\left(q \,\Big\|\, \frac{p+q}{2}\right).$$

Here $D(p\|q)$ denotes the relative entropy of $p$ with respect to $q$,
where $p$ and $q$ are probability distributions on $\mathbb{N}$, defined by

$$D(p\|q) := \sum_{y \in \mathbb{N}} p(y) \log \frac{p(y)}{q(y)}.$$

The logarithm is assumed to be taken to base 2 (we use ln for the logarithm
to the natural base).
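To make these quantities concrete, here is a small Python sketch that computes the relative entropy in bits for finitely supported distributions. The function `j_distance` rests on an assumption made for this illustration, namely that $J(p, q)$ is the symmetrized quantity $D(p\|m) + D(q\|m)$ with $m = (p+q)/2$; the dictionary representation and both function names are likewise hypothetical.

```python
import math
from typing import Dict


def relative_entropy(p: Dict[int, float], q: Dict[int, float]) -> float:
    """D(p||q) in bits for distributions given as {outcome: probability}.
    Returns infinity if p puts mass on an outcome that q excludes."""
    total = 0.0
    for y, py in p.items():
        if py == 0.0:
            continue                      # the term 0 * log(0/q) is taken as 0
        qy = q.get(y, 0.0)
        if qy == 0.0:
            return math.inf
        total += py * math.log2(py / qy)
    return total


def j_distance(p: Dict[int, float], q: Dict[int, float]) -> float:
    """Assumed form of J: D(p||m) + D(q||m), with m the midpoint (p+q)/2."""
    support = set(p) | set(q)
    m = {y: 0.5 * (p.get(y, 0.0) + q.get(y, 0.0)) for y in support}
    return relative_entropy(p, m) + relative_entropy(q, m)
```

Under this assumed form, two distributions with disjoint supports are at distance 2 bits even though the relative entropy between them is infinite, which is one reason a symmetrized, midpoint-based quantity is convenient for defining neighborhoods.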

2.2. Cumulative distribution function 

Since we would like to discuss percentiles, it is convenient to use a
non-standard definition for the cumulative distribution function of a
probability distribution on $\mathbb{N}$.

For our purposes, the cumulative distribution function of any probability
distribution $p$ on $\mathbb{N}$ is a function from
$\mathbb{R}^+ \cup \{\infty\} \to [0, 1]$, and will be denoted by $F_p$. We
obtain $F_p$ by first defining $F_p$ on points in the support of $p$. We
define $F_p$ for all other nonnegative real numbers by linearly
interpolating between the values in the support of $p$. Finally,
$F_p(\infty) := 1$.

Let $F_p^{-1} : [0, 1] \to \mathbb{R}^+ \cup \{\infty\}$ denote the inverse
function of $F_p$. Then $F_p^{-1}(x) = 0$ for all $0 \le x < F_p(0)$. If $p$
has infinite support then $F_p^{-1}(1) = \infty$, else $F_p^{-1}(1)$ is the
smallest natural number $y$ such that $F_p(y) = 1$.
Two simple and useful observations can now be made. Consider a probability
distribution $p$ with support $A \subseteq \mathbb{N}$. For $\delta > 0$,
let ($T$ for tail)

$$T_{p,\delta} := \{y \in A : y \ge F_p^{-1}(1 - \delta)\},$$

and let ($H$ for head)

$$H_{p,\delta} := \{y \in A : y \le 2F_p^{-1}(1 - \delta/2)\}.$$

It is easy to see that

$$p(T_{p,\delta}) > \delta \quad \text{and} \quad p(H_{p,\delta}) > 1 - \delta.$$

Suppose that for some $\delta > 0$ we have $F_p^{-1}(1 - \delta) > 0$ and
the loss-dominant at the beginning of period $i \ge 1$ happens to be set to
$F_p^{-1}(1 - \delta)$, then the probability under $p$ of the loss in period
$i$ exceeding the loss-dominant is bigger than $\delta$. If the
loss-dominant at the beginning of period $i$ happens to be set to
$2F_p^{-1}(1 - \delta/2)$, then the probability that the loss in period $i$
exceeds the loss-dominant is less than $\delta$. We will use these
observations in the proofs to follow.
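The interpolated inverse distribution function and the tail and head sets can be illustrated with a short Python sketch for a finitely supported $p$. Several choices here are assumptions made for the example rather than statements from the text: the value of $F_p$ on the support is taken to be the usual cumulative probability, the support is passed as a sorted list, the inverse returns 0 below the first cumulative value, and the names `make_cdf_inverse`, `tail` and `head` are hypothetical.

```python
from typing import Callable, List, Sequence


def make_cdf_inverse(support: Sequence[int], probs: Sequence[float]) -> Callable[[float], float]:
    """Return F_p^{-1} for a finitely supported p with positive `probs`.
    Assumes `support` is sorted increasingly, F_p(y) = P(X <= y) on the
    support, and linear interpolation between consecutive support points."""
    cdf: List[float] = []
    running = 0.0
    for pr in probs:
        running += pr
        cdf.append(running)

    def inverse(x: float) -> float:
        if x < cdf[0]:
            return 0.0                    # convention assumed below the first cdf value
        for k in range(1, len(cdf)):
            if x <= cdf[k]:
                lo, hi = cdf[k - 1], cdf[k]
                frac = (x - lo) / (hi - lo)
                return support[k - 1] + frac * (support[k] - support[k - 1])
        return float(support[-1])         # x at (or numerically above) the top
    return inverse


def tail(support: Sequence[int], probs: Sequence[float], delta: float) -> List[int]:
    """T_{p,delta}: support points at or above F_p^{-1}(1 - delta)."""
    inv = make_cdf_inverse(support, probs)
    return [y for y in support if y >= inv(1 - delta)]


def head(support: Sequence[int], probs: Sequence[float], delta: float) -> List[int]:
    """H_{p,delta}: support points at or below 2 * F_p^{-1}(1 - delta/2)."""
    inv = make_cdf_inverse(support, probs)
    return [y for y in support if y <= 2 * inv(1 - delta / 2)]
```

For the uniform distribution on 0, 1, 2, 3 and $\delta = 1/4$, the tail is the set containing 2 and 3, of probability 1/2, and the head is the whole support, in line with the two inequalities above.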

2.3. Necessary and sufficient conditions for insurability 

Existence of close distributions with very different quantiles is what kills
insurability. A loss domination scheme could be "deceived" by some process
$p \in \mathcal{P}^\infty$ into setting low loss-dominants, while a close
enough distribution hits the scheme with too high a loss. The conditions for
insurability of $\mathcal{P}^\infty$ are phrased in terms of the set of its
one dimensional marginals, $\mathcal{P}$.

Formally, a probability distribution $p$ in $\mathcal{P}$ is called
deceptive if $\forall \epsilon > 0$, $\exists \delta > 0$ such that no
matter what $f(\delta) \in \mathbb{R}^+$ is chosen, $\exists$ a (bad)
distribution $q \in \mathcal{P}$ such that

$$J(p, q) < \epsilon$$

and

$$F_q^{-1}(1 - \delta) > f(\delta).$$

In the above definition, $f(\delta)$ is simply an arbitrary nonnegative real
number. However, it is useful to think of this number as the evaluation of a
function $f : (0, 1) \to \mathbb{R}$ at $\delta$. Equivalently, a
distribution $p$ in $\mathcal{P}$ is not deceptive if
$\exists \epsilon_p > 0$, such that $\forall \delta > 0$,
$\exists f(\delta) \in \mathbb{R}$, such that all distributions
$q \in \mathcal{P}$ with

$$J(p, q) < \epsilon_p$$

satisfy

$$F_q^{-1}(1 - \delta) \le f(\delta).$$

Our main theorem is the following, which we prove in Sections 4 and 5.

Theorem 1. $\mathcal{P}^\infty$ is insurable, iff no $p \in \mathcal{P}$ is
deceptive. □

3. Examples 

Consider $\mathcal{U}$, the collection of all uniform distributions over a
finite contiguous support of the form $\{m, \ldots, M\}$, with $m < M$ being
arbitrary nonnegative integers. Let the losses come as i.i.d. samples from
one of the distributions in $\mathcal{U}$; call the resulting model class
$\mathcal{U}^\infty$.

Example 1. $\mathcal{U}^\infty$ is insurable.

Proof. If the threshold probability of ruin is $\eta$, choose the
loss-domination scheme $\Phi$ as follows. For all sequences $x^n$ with
$n \le \log\frac{1}{\eta} + 1$ set $\Phi(x^n) = \infty$. For all sequences
$x^n$ with $n > \log\frac{1}{\eta} + 1$, the loss-dominant $\Phi(x^n)$ is
set to be twice the largest loss observed thus far. It is easy to see that
this scheme is bankrupted with probability less than $\eta$ irrespective of
which $p \in \mathcal{U}^\infty$ is in effect. □
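The scheme in this proof is simple enough to write down directly. The sketch below assumes the waiting threshold is $\log_2(1/\eta) + 1$ observations (a reading of the threshold that is consistent with the base-2 logarithm used elsewhere in the paper), and the function names `twice_max_dominant` and `bankrupt_on` are hypothetical.

```python
import math
from typing import Sequence


def twice_max_dominant(prefix: Sequence[int], eta: float) -> float:
    """Loss-dominant after observing `prefix`, for 0 < eta < 1: infinite
    while n <= log2(1/eta) + 1, and twice the largest observed loss after."""
    n = len(prefix)
    if n <= math.log2(1.0 / eta) + 1:
        return math.inf
    return 2.0 * max(prefix)


def bankrupt_on(losses: Sequence[int], eta: float) -> bool:
    """True if some loss exceeds the dominant that was set just before it."""
    return any(
        twice_max_dominant(losses[:n], eta) < losses[n]
        for n in range(len(losses))
    )
```

The intuition for a uniform distribution on $\{m, \ldots, M\}$ is that the scheme can only fail if every observation made before entry lies in the lower half of the support; once a loss of at least $M/2$ has been seen, twice the running maximum is at least $M$.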

Consider the set $\mathcal{N}^\infty$ of all i.i.d. processes such that the
one dimensional marginals have finite moment. Namely,
$\forall p \in \mathcal{N}^\infty$, $E_p X_1 < \infty$.

Example 2. $\mathcal{N}^\infty$ is not insurable.

Proof. Note that the loss process that puts probability 1 on the all zero
sequence exists in $\mathcal{N}^\infty$, since it corresponds to the one
dimensional marginal loss distribution that produces loss 0 in each period.
Since every loss domination scheme enters with probability 1 no matter which
$p \in \mathcal{N}^\infty$ is in force, every loss domination scheme must
enter after seeing some finite number of zeros. Fix any loss domination
scheme $\Phi$. Suppose the scheme starts to set finite dominants after
seeing $N$ losses of size 0. To show that $\mathcal{N}^\infty$ is not
insurable, we show that $\exists \eta > 0$ and
$\exists p \in \mathcal{N}^\infty$ such that

$$p(\Phi \text{ goes bankrupt}) \ge \eta.$$

Fix $\delta = 1 - \eta$. Let $\epsilon$ be small enough that

$$(1 - \epsilon)^N > 1 - \delta/2,$$

and let $M$ be a number large enough that

$$(1 - \epsilon)^M < \delta/2.$$

Note that since $1 - \delta/2 > \delta/2$, we have $N < M$. Let $L$ be
greater than any of the loss-dominants set by $\Phi$ for the sequences
$0^N, 0^{N+1}, \ldots, 0^M$. Let $p \in \mathcal{N}^\infty$ satisfy, for all
$i$,

$$p(X_i = 0) = 1 - \epsilon \quad \text{and} \quad p(X_i = L) = \epsilon.$$

For the i.i.d. loss process having the law $p$, the insurer is bankrupted on
all sequences that contain loss $L$ in between the $N$-th and $M$-th steps.
These sequences, $0^N L, 0^{N+1} L, \ldots, 0^{M-1} L$, have respective
probabilities (under $p$)

$$(1 - \epsilon)^N \epsilon, \ (1 - \epsilon)^{N+1} \epsilon, \ \ldots, \ (1 - \epsilon)^{M-1} \epsilon,$$

and they also form a prefix free set. Therefore, summing up the geometric
series and using the assumptions on $\epsilon$ above,

$$p(\Phi \text{ is bankrupted}) \ge (1 - \epsilon)^N - (1 - \epsilon)^M > 1 - \delta/2 - \delta/2 = \eta. \quad □$$
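The arithmetic in this proof can be checked numerically. The Python sketch below picks one admissible $\epsilon$ and $M$ for a given $N \ge 1$ and $\eta$, then sums the probabilities of the sequences $0^j L$ for $j = N, \ldots, M-1$. It assumes a two-point law with mass $1 - \epsilon$ on 0 and $\epsilon$ on $L$; the particular way of choosing $\epsilon$ (half of the largest admissible value) and the name `example2_bound` are choices made for this illustration only.

```python
from typing import Tuple


def example2_bound(N: int, eta: float) -> Tuple[float, int, float]:
    """For N >= 1 and 0 < eta < 1, return (eps, M, prob) where
    (1 - eps)^N > 1 - delta/2, (1 - eps)^M < delta/2 with delta = 1 - eta,
    and prob is the total probability of the sequences 0^j L, N <= j < M."""
    delta = 1.0 - eta
    # Any eps below 1 - (1 - delta/2)^(1/N) works; take half of that bound.
    eps = 0.5 * (1.0 - (1.0 - delta / 2.0) ** (1.0 / N))
    # Smallest M with (1 - eps)^M < delta/2.
    M = N
    while (1.0 - eps) ** M >= delta / 2.0:
        M += 1
    # The sequence 0^j L has probability (1 - eps)^j * eps.
    prob = sum((1.0 - eps) ** j * eps for j in range(N, M))
    return eps, M, prob
```

For instance, with $N = 5$ and $\eta = 1/2$ the returned probability matches $(1-\epsilon)^N - (1-\epsilon)^M$ and exceeds $\eta$, as the geometric-series step asserts.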

One can actually directly verify that every distribution in
$\mathcal{N}^\infty$ is deceptive.

Consider the collection of all i.i.d. loss distributions with monotone one
dimensional marginals. A monotone probability distribution $p$ on
$\mathbb{N}$ is one that satisfies $p(y + 1) \le p(y)$ for all
$y \in \mathbb{N}$. Let $\mathcal{M}^\infty$ be the set of all i.i.d. loss
processes, with one dimensional marginal distribution from $\mathcal{M}$,
the collection of all monotone probability distributions over $\mathbb{N}$.

Again, it is easily shown that every distribution in $\mathcal{M}$ is
deceptive. It follows from Theorem 1 that

Example 3. $\mathcal{M}^\infty$ is not insurable. □

4. Necessary condition for insurability 

In this section we prove one direction of Theorem 1 as stated next.

Theorem 2. If $\mathcal{P}^\infty$ is insurable, then no
$p \in \mathcal{P}$ is deceptive.
Proof. To keep notation simple, we will denote by $p$ (or $q$) both a
measure in $\mathcal{P}^\infty$ as well as the corresponding one dimensional
marginal distribution, which is a member of $\mathcal{P}$. The context will
clarify which of the two is meant. We prove the contrapositive of the
theorem: if some $p \in \mathcal{P}$ is deceptive, then $\mathcal{P}^\infty$
is not insurable.

Pick $0 < \alpha < h^{-1}(\frac{1}{2})$ where $h(x)$ is the binary entropy
function defined for $0 \le x \le 1$ by

$$h(x) := -x \log x - (1 - x) \log(1 - x)$$

where the logarithm is to base 2. Fix
$0 < \eta < (1 - 2h(\alpha))(1 - \frac{1}{e})$, the bounds chosen in order
to satisfy the technical requirements of the proof of Lemma 4. Suppose
$p \in \mathcal{P}$ is deceptive. We prove that $\mathcal{P}^\infty$ is not
insurable by finding for each loss domination scheme $\Phi$ a probability
distribution $q \in \mathcal{P}$ such that

$$q(\Phi \text{ goes bankrupt}) \ge \eta.$$

So, let $\Phi$ be any loss domination scheme. Recall that $\Phi$ enters on
$p$ with probability 1, in the sense that the loss dominants set by $\Phi$
will eventually become finite with probability 1 under $p$. For all
$n \ge 1$, let

$$R_n := \{x^n : \Phi(x^n) < \infty\}$$

be the set of sequences of length $n$ on which $\Phi$ has entered and let
$N \ge 1$ be a number such that

$$p(R_N) > 1 - \alpha/2. \quad (1)$$

For any sequence $x^n$, let $A(x^n)$ be the set of symbols that appear in
it. Recall that the head of the distribution $p$, $H_{p,\gamma}$, was
defined in Section 2.2 to be the set
$\{y \in A : y \le 2F_p^{-1}(1 - \gamma/2)\}$, where $A$ is the support of
$p$. Further, define for all $\gamma > 0$

$$R_{p,\gamma,n} := \{x^n \in R_n : A(x^n) \subseteq H_{p,\gamma}\}.$$

Set $\epsilon =$ Since $p$ is deceptive, there exists $\delta > 0$ such that
for all $f(\delta) \in \mathbb{R}$, there exists a distribution
$q \in \mathcal{P}$ satisfying both

$$J(p, q) < \epsilon \quad \text{and} \quad F_q^{-1}(1 - \delta) > f(\delta). \quad (2)$$

While the number $f(\delta)$ can be arbitrary above, we focus on a specific
number dependent only on $\Phi$. To define this number, first pick
$k \ge 2$ large enough that

$$(1 - \delta^k)^{N + 1/\delta} > 1 - \alpha/2. \quad (3)$$

Note that the limit of the left side above as $k \to \infty$ is 1, so there
is always some choice $k$ that works. Now, for all $0 < \delta' < 1$, let

$$f(\delta') := \max_{x^i \in R_{p,\delta'^k,i},\ N \le i \le N + \lceil 1/\delta' \rceil} \Phi(x^i).$$

A word about this parameter $k$, since it is not immediately apparent why
this should be defined. We will effectively ignore the $\delta^k$ tail of
the distribution $p$, and focus only on strings in $R_{p,\delta^k,i}$,
$N \le i \le N + \frac{1}{\delta}$. The advantage of doing so is technical:
we will be able to handle $p$ and $q$ as though they were distributions with
finite span. This is crucial, since we want to have the maximum over a
finite set to ensure $f(\delta') < \infty$. Furthermore, note that for
$N \le i \le N + \frac{1}{\delta}$, $p(R_{p,\delta^k,i}) > 1 - \alpha$ from
(1) and (3).

Let $q \in \mathcal{P}$ satisfy (2) with $f(\delta)$ as defined above.
Applying Lemma 4 to distributions over length-$n$ sequences induced by the
measures $p, q \in \mathcal{P}^\infty$ corresponding to the distributions
above,

$$q(R_{p,\delta^k,N}) \ge 1 - \frac{2}{N} - 2h(\alpha),$$

namely, $\Phi$ has entered with probability (under $q$) at least
$1 - \frac{2}{N} - 2h(\alpha)$ for length $N$ sequences. Since the insurer
cannot quit once it has entered, the scheme has entered with probability
(under $q$) at least $1 - \frac{2}{N} - 2h(\alpha)$ for all $n$ length
sequences where $n \ge N$. Namely for all $n \ge N$,

$$q(R_{p,\delta^k,n}) \ge 1 - \frac{2}{N} - 2h(\alpha).$$

For convenience, let $M = \lceil \frac{1}{\delta} \rceil$. Let the
distribution $q$ be in force. We have set things up so that $\Phi$ is
bankrupted whenever any element in the $\delta$-tail of $q$ follows any
sequence in $R_{p,\delta^k,i}$, where $N \le i \le N + M - 1$. To see this,
note that

$$F_q^{-1}(1 - \delta) > f(\delta) = \max_{x^i \in R_{p,\delta^k,i},\ N \le i \le N + \lceil 1/\delta \rceil} \Phi(x^i). \quad (4)$$

Equivalently, conditioned on any sequence in $R_{p,\delta^k,i}$ with $i$
between $N$ and $N + M - 1$, the scheme $\Phi$ fails with probability (under
$q$) at least $\delta$ in step $i + 1$.

A sequence on which $\Phi$ has entered, but such that $\Phi$ has not been
bankrupted on any of the sequence's prefixes is called a surviving sequence.

Consider a surviving sequence $x^N \in R_{p,\delta^k,N}$ in the support of
$p$ at level $N$. Given $x^N$, let the conditional probability that $\Phi$
is bankrupted in the following step be $\delta_N$. From (4), as mentioned
before, we have $\delta_N \ge \delta$.

Now, given $x^N \in R_{p,\delta^k,N}$, the conditional probability that
$\Phi$ is bankrupted in at most two further steps is,

$$\delta_N + (1 - \delta_N)\delta_{N+1} \ge \delta + (1 - \delta)\delta,$$

where $\delta_{N+1}$ is interpreted as the weighted average (over surviving
length-$(N+1)$ suffixes of $x^N$) of the probability that $\Phi$ goes
bankrupt in step $N + 2$.
Similarly, given a sequence $x^N \in R_{p,\delta^k,N}$, the probability that
$\Phi$ is bankrupted on suffixes of $x^N$ with length between $N$ and
$N + M$ is

$$\delta_N + (1 - \delta_N)\delta_{N+1} + \cdots + \delta_{N+M} \prod_{i=N}^{N+M-1} (1 - \delta_i)$$

for some $\delta_N, \delta_{N+1}, \ldots, \delta_{N+M}$, all of which are
$\ge \delta$.

Let $q_1$ be the probability of all survivors in $R_{p,\delta^k,N}$, and
$q_2$ be the probability of all sequences in $R_{p,\delta^k,N}$ where $\Phi$
has already been bankrupted. Therefore $q_1 + q_2 = q(R_{p,\delta^k,N})$.

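Let $\bar{\delta}_i$ stand for $1 - \delta_i$. Now $\Phi$ is bankrupted with
probability

$$q_2 + q_1\left(\delta_N + \cdots + \delta_{N+M} \prod_{i=N}^{N+M-1} (1 - \delta_i)\right)$$
$$= q_2 + q_1\left(\delta_N + \bar{\delta}_N\left(\delta_{N+1} + \bar{\delta}_{N+1}\left(\cdots \delta_{N+M-1} + \bar{\delta}_{N+M-1}\delta_{N+M}\right)\right)\right)$$
$$\ge q_2 + q_1\left(\delta + (1 - \delta)\delta + \cdots + (1 - \delta)^M \delta\right)$$
$$= q_2 + q_1\left(1 - (1 - \delta)^{M+1}\right)$$
$$\ge q(R_{p,\delta^k,N})\left(1 - (1 - \delta)^{\lceil 1/\delta \rceil}\right)$$
$$\ge \left(1 - \frac{2}{N} - 2h(\alpha)\right)\left(1 - (1 - \delta)^{\lceil 1/\delta \rceil}\right).$$

The Theorem follows. □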
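The middle steps of this chain rest on two elementary facts: the nested sum of conditional bankruptcy probabilities only decreases when each $\delta_i$ is replaced by its lower bound $\delta$, and the resulting geometric sum telescopes to $1 - (1 - \delta)^{M+1}$. The Python sketch below checks both numerically; the function names `bankrupt_within` and `uniform_lower_bound` are hypothetical and the sample values are arbitrary.

```python
import math
from typing import Sequence


def bankrupt_within(deltas: Sequence[float]) -> float:
    """delta_N + (1 - delta_N) * delta_{N+1} + ... for a list of per-step
    conditional bankruptcy probabilities, i.e. one minus the probability of
    surviving every listed step."""
    total, survive = 0.0, 1.0
    for d in deltas:
        total += survive * d              # bankrupt now, having survived so far
        survive *= 1.0 - d
    return total


def uniform_lower_bound(delta: float) -> float:
    """1 - (1 - delta)^ceil(1/delta), the factor in the last two lines."""
    return 1.0 - (1.0 - delta) ** math.ceil(1.0 / delta)
```

Since $(1 - \delta)^{\lceil 1/\delta \rceil} \le e^{-1}$ for every $0 < \delta \le 1$, the factor returned by `uniform_lower_bound` is always at least $1 - 1/e$, so the final bound does not degrade as $\delta$ becomes small.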



5. Sufficient condition for insurability 

If no $p \in \mathcal{P}$ is deceptive, there is for each
$p \in \mathcal{P}$ a number $\epsilon_p > 0$ such that, for every
$\delta > 0$, over the set of probability distributions in $\mathcal{P}$
that are in the neighborhood

$$\{p' \in \mathcal{P} : J(p', p) < \epsilon_p\},$$

there is a uniform bound on the $\delta$-percentile. We pick such an
$\epsilon_p$ for each $p \in \mathcal{P}$ and call it the reach of $p$. For
$p \in \mathcal{P}$, the set

$$B_p = \{p' \in \mathcal{P} : J(p, p') < \epsilon_p\},$$

where $\epsilon_p$ is the reach of $p$, will play the role of the set of
probability distributions in $\mathcal{P}$ such that even if the true
marginal loss distribution in force is one of these distributions, it will
be okay to eventually set loss-dominants as if $p$ were in force.
To prove that $\mathcal{P}^\infty$ is insurable if there are no deceptive
distributions among its one dimensional marginals $\mathcal{P}$, we will
need to find a way to cover the set $\mathcal{P}$ with countably many sets
of the form $B_p$. Unfortunately, $J(p, q)$ is not a metric, so it is not
immediately clear how to go about doing this. On the other hand since
$J(p, p') \le |p - p'|_1 / \ln 2$, where $|p - p'|_1$ denotes the $\ell_1$
distance between $p$ and $p'$, see Lemma 5 in the Appendix, we can bootstrap
off an understanding of the topology induced on $\mathcal{P}$ by the
$\ell_1$ topology on the set of all probability distributions on
$\mathbb{N}$.

5.1. Topology of $\mathcal{P}$ with the $\ell_1$ metric

The topology induced on $\mathcal{P}$ by the $\ell_1$ metric is Lindelöf, i.e. any covering of $\mathcal{P}$ with open sets in the $\ell_1$ topology has a countable subcover; see [5, Defn. 6.4] for this definition of a Lindelöf topological space.

We can show that $\mathcal{P}$ with the $\ell_1$ topology is Lindelöf by appealing to the fact that the set of all probability distributions on $\mathbb{N}$, with the $\ell_1$ topology, is second countable, i.e. that it has a countable basis, which is a consequence of its having a countable norm-dense set (consider the set of all probability distributions on $\mathbb{N}$ with finite support and with all probabilities being rational). Now, $\mathcal{P}$, as a topological subspace of a second countable topological space, is also second countable [5, Theorem 6.2(2)], and every second countable topological space is Lindelöf [5, Thm. 6.3].
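
To make the density claim concrete, the following is a minimal Python sketch (not part of the original argument) of how a distribution on $\mathbb{N}$ can be approximated within any $\epsilon$ in $\ell_1$ by a finitely supported distribution with rational probabilities. The function name `rational_finite_approx` and its parameters are hypothetical, and the sketch assumes the distribution is given as a callable on $n = 1, 2, \ldots$ whose values are represented exactly enough for the truncation test.

```python
from fractions import Fraction


def rational_finite_approx(p, eps, max_terms=10_000):
    """Approximate p within eps in l1 by a finite-support, rational-valued distribution.

    p maps n = 1, 2, ... to the probability of n. The result is a list whose
    entry i (0-based) is the rational mass placed on symbol i + 1.
    """
    # Step 1: keep the shortest prefix {1, ..., m} whose mass exceeds 1 - eps/4.
    masses, total = [], 0.0
    for n in range(1, max_terms + 1):
        masses.append(p(n))
        total += masses[-1]
        if total > 1 - eps / 4:
            break
    else:
        raise ValueError("tail too heavy for max_terms; increase it")
    m = len(masses)
    # Step 2: round every kept mass down to a multiple of 1/den. Since
    # den > 4m/eps, the total rounding loss is below eps/4.
    den = int(4 * m / eps) + 1
    approx = [Fraction(int(x * den), den) for x in masses]
    # Step 3: return the removed mass (tail plus rounding loss, below eps/2)
    # to symbol 1 so that the result sums to exactly 1.
    approx[0] += 1 - sum(approx)
    return approx
```

The three error contributions (discarded tail, rounding loss, and the mass returned to symbol 1) add up to less than $\epsilon$, which is why the set of such rational, finitely supported distributions is norm-dense; it is countable because each element is a finite tuple of rationals.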

5.2. Sufficient condition

We now have the machinery required to prove that if no $p \in \mathcal{P}$ is deceptive, then $\mathcal{P}^\infty$ is insurable, which is the other direction of Theorem 1, as stated next.

Theorem 3. If no $p \in \mathcal{P}$ is deceptive, then $\mathcal{P}^\infty$ is insurable.

Proof. The proof is constructive. For any $0 < \eta < 1$, we obtain a loss domination scheme $\Phi$ such that for all $p \in \mathcal{P}^\infty$, $p(\Phi \text{ goes bankrupt}) \le \eta$.
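For $p \in \mathcal{P}$, let

$$Q_p := \Bigl\{ q : |p-q|_1 < \frac{\epsilon_p^2 (\ln 2)^2}{16} \Bigr\},$$

where $\epsilon_p$ is the reach of $p$. The set $Q_p$ is non-empty when $\epsilon_p > 0$. For large $n$, loss sequences of length $n$ with empirical distribution in $Q_p$ will play the role of those loss sequences on which the loss domination scheme $\Phi$ to be proposed will have entered; this will ensure that $\Phi$ enters with probability 1 when $p$ is in force. Note that if $\epsilon_p$ is small enough then $Q_p \cap \mathcal{P} \subset B_p$.

Since no $p \in \mathcal{P}$ is deceptive, none of the sets $Q_p$ are empty and the space $\mathcal{P}$ of distributions can be covered by the sets $Q_p \cap \mathcal{P}$, namely

$$\mathcal{P} = \bigcup_{p \in \mathcal{P}} (Q_p \cap \mathcal{P}).$$

From Section 5.1, we know that $\mathcal{P}$ is Lindelöf under the $\ell_1$ topology. Thus, there is a countable set $\tilde{\mathcal{P}} \subset \mathcal{P}$ such that $\mathcal{P}$ is covered by the collection of relatively open sets $\mathcal{Q}_{\tilde{\mathcal{P}}}$, where

$$\mathcal{Q}_{\tilde{\mathcal{P}}} = \{Q_p \cap \mathcal{P} : p \in \tilde{\mathcal{P}}\}.$$

We index the countable set $\tilde{\mathcal{P}}$ (or $\mathcal{Q}_{\tilde{\mathcal{P}}}$) by $\iota : \tilde{\mathcal{P}} \to \mathbb{N}$.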

We now describe the loss domination scheme $\Phi$ having the property that for all $p \in \mathcal{P}^\infty$, $p(\Phi \text{ goes bankrupt}) \le \eta$.

Preliminaries. Consider a length-$n$ sequence $x^n$ on which $\Phi$ has not entered thus far. Let the empirical distribution of the sequence be $q$, and let

$$\mathcal{P}'_q := \{p' \in \tilde{\mathcal{P}} : q \in Q_{p'}\}$$

be the set of distributions in $\mathcal{P}$ (more precisely, $\tilde{\mathcal{P}}$) which can potentially capture $q$. Note that $q$ in general need not belong to $\mathcal{P}$.

If $\mathcal{P}'_q \ne \emptyset$, we will refine the set of distributions that could capture $q$ further to $\mathcal{P}_q \subset \mathcal{P}'_q$. This is to ensure that models in $\mathcal{P}'_q$ do not prematurely capture loss sequences.

Let $p$ be the model in force. The idea is that we want sequences generated by $p$ to be captured by models that have $p$ in their reach. We will require (5) below to hold to ensure that if any other distribution $p' \in \mathcal{P}_q$ which does not have $p$ in its reach captures sequences with empirical distribution $q$, then the probability of such sequences is not too large under $p$. In addition, we impose (6) as well to resolve a technical issue since $q$ need not, in general, belong to $\mathcal{P}$.

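For $p' \in \mathcal{P}'_q$, let the reach of $p'$ be $\epsilon_{p'}$, and define

$$D_{p'} := \frac{\epsilon_{p'}^4 (\ln 2)^4}{256}.$$

This quantity will lower bound the distance of the empirical distribution $q$ in question from the distribution $p$ in force if $p$ happens to be out of the reach of $p'$. Specifically, we place $p' \in \mathcal{P}_q$ if $n$ satisfies

$$\exp\Bigl(-\frac{n D_{p'}}{18}\Bigr) \le \frac{\eta}{2\,C(p')\,\iota(p')^2 n^2}\cdot\frac{36}{\pi^4} \tag{5}$$

and

$$F_q^{-1}\bigl(1-\sqrt{D_{p'}}/3\bigr) \le \log C(p'), \tag{6}$$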

where C{p') is 



C{p') := 2 




Note that $C(p')$ is finite since $p'$ is not deceptive. Comparison with Lemma 7 will give a hint as to why the equations above look the way they do.

Description of $\Phi$. For the sequence $x^n$ with type $q$, if $\mathcal{P}_q = \emptyset$, the scheme does not enter yet. If $\mathcal{P}_q \ne \emptyset$, let $p_q$ denote the distribution in $\mathcal{P}_q$ with the smallest index.

All sequences with prefix $x^n$ (namely, sequences obtained by concatenating $x^n$ with any other sequence of symbols) are then said to be trapped by $p_q$; that is, loss-dominants will be based on $p_q$. The loss-dominant assigned for a length-$m$ sequence trapped by $p_q$ is



$\Phi$ enters with probability 1. First, we verify that the scheme enters with probability 1, no matter what distribution $p \in \mathcal{P}$ is in force. Every distribution $p \in \mathcal{P}$ is contained in at least one of the sets in $\mathcal{Q}_{\tilde{\mathcal{P}}}$. Let $Q \in \mathcal{Q}_{\tilde{\mathcal{P}}}$ be the set with the smallest index among all sets in $\mathcal{Q}_{\tilde{\mathcal{P}}}$ that contain $p$. There is thus some $\gamma > 0$ such that the neighborhood around $p$ given by

$$I(p,\gamma) := \{q : |p-q|_1 < \gamma\}$$

satisfies $I(p,\gamma) \subseteq Q$. Let $p'$ be the distribution of $\tilde{\mathcal{P}}$ which defines the set $Q$ in $\mathcal{Q}_{\tilde{\mathcal{P}}}$. Note in particular that $p$ is in the reach of $p'$.

With probability 1, the empirical distribution of sequences generated by $p$ lies within $I(p,\gamma)$ for all sufficiently large $n$ [6] (see also Lemma 7 for an alternate proof). Now (5) will hold for all empirical distributions that fall in $I(p,\gamma)$ if we make $n$ large enough, since $C(p')$ and $\iota(p')$ do not change with $n$ and the right hand side diminishes to zero polynomially with $n$, while the left hand side diminishes exponentially to zero. Lastly, (6) will also hold almost surely: if $q$ is the empirical distribution of sequences generated by $p$, then with probability 1 the percentile $F_q^{-1}\bigl(1-\sqrt{D_{p'}}/3\bigr)$ is eventually at most $\log C(p')$, because $p$ is in the reach of $p'$.

Probability of bankruptcy $\le \eta$. We now analyze the scheme. Consider any $p \in \mathcal{P}$. Among sequences on which $\Phi$ has entered, we will distinguish between those that are in good traps and those in bad traps. If a sequence $x^n$ is trapped by $p'$ such that $p \in B_{p'}$, $p'$ is a good trap. Conversely, if $p \notin B_{p'}$, $p'$ is a bad trap.

(Good traps) Suppose a length-$n$ sequence $x^n$ is in a good trap, namely, it is trapped by a distribution $p'$ such that $p \in B_{p'}$. Recall that the loss-dominant assigned is



where the inequality follows because $p'$ is not deceptive, and $p$ is within the reach of $p'$. Therefore, the scheme is bankrupted with probability at most $\delta' = 6\eta/(2\pi^2 n^2)$ in the next step. Summing over all $n$, sequences in good traps contribute at most $\eta/2$ to the probability of bankruptcy.

(Bad traps) We will show that the probability with which sequences generated by $p$ fall into bad traps is at most $\eta/2$. Pessimistically, the conditional probability of bankruptcy in the very next step, given that a sequence falls into a bad trap, is upper bounded by 1. Thus the contribution to bankruptcy by sequences in bad traps is at most $\eta/2$.

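Let $q$ be any length-$n$ empirical distribution trapped by $\tilde p$ with reach $\tilde\epsilon$ such that $p \notin B_{\tilde p}$. We obtain from Lemma 6 that $J(p,q) \ge \tilde\epsilon^2 \ln 2/16$. Hence, for all $q$ trapped by $\tilde p$,

$$|p-q|_1^2 \ge J^2(p,q)(\ln 2)^2 \ge \frac{\tilde\epsilon^4 (\ln 2)^4}{256}.$$

Thus, for $p \in \mathcal{P}$, the probability that length-$n$ sequences with empirical distribution $q$ are trapped by a bad $\tilde p$ is, using (5) and (6),

$$
\begin{aligned}
&\le p\Bigl(|q-p|_1^2 \ge D_{\tilde p} \text{ and } F_q^{-1}\bigl(1-\sqrt{D_{\tilde p}}/3\bigr) \le \log C(\tilde p)\Bigr)\\
&\overset{(a)}{\le} (C(\tilde p)-2)\exp\Bigl(-\frac{nD_{\tilde p}}{18}\Bigr)\\
&\overset{(b)}{\le} \frac{\eta\,(C(\tilde p)-2)}{2\,C(\tilde p)\,\iota(\tilde p)^2 n^2}\cdot\frac{36}{\pi^4}\\
&\le \frac{\eta}{2\,\iota(\tilde p)^2 n^2}\cdot\frac{36}{\pi^4},
\end{aligned}
$$

where the inequality (a) follows from Lemma 7 and (b) from (5). Therefore, the probability of sequences falling into bad traps is

$$\le \sum_{n\ge 1}\sum_{p'\in\tilde{\mathcal{P}}} \frac{\eta}{2\,\iota(p')^2 n^2}\cdot\frac{36}{\pi^4} \le \frac{\eta}{2},$$

since $\sum_{p'\in\tilde{\mathcal{P}}} \frac{1}{\iota(p')^2} \le \sum_{n\ge 1}\frac{1}{n^2} = \frac{\pi^2}{6}$. The theorem follows. □
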
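
As a sanity check on the constants, the short Python sketch below evaluates partial sums of the two error budgets used in the proof: the good-trap budget $\sum_n 6\eta/(2\pi^2 n^2)$ and the bad-trap budget $\sum_n \sum_\iota 36\eta/(2\pi^4 \iota^2 n^2)$. The function names are hypothetical, and the bad-trap sum assumes the worst case in which the indices $\iota(p')$ exhaust $\{1, 2, \ldots\}$.

```python
import math


def good_trap_budget(eta, n_max):
    """Partial sum over n <= n_max of the per-step bound 6*eta / (2*pi^2*n^2)."""
    return sum(6 * eta / (2 * math.pi ** 2 * n * n) for n in range(1, n_max + 1))


def bad_trap_budget(eta, n_max, index_max):
    """Partial double sum of 36*eta / (2*pi^4 * index^2 * n^2).

    Worst case assumed: every index 1..index_max is used by some distribution.
    """
    sum_n = sum(1 / (n * n) for n in range(1, n_max + 1))
    sum_index = sum(1 / (i * i) for i in range(1, index_max + 1))
    return 36 * eta / (2 * math.pi ** 4) * sum_n * sum_index
```

Both partial sums increase towards $\eta/2$ from below as the truncation points grow, so the two contributions together stay within the prescribed bankruptcy probability $\eta$.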
6. Concluding remarks 

The loss domination problem formulated and solved in this paper appears to 
be of natural interest. However, there are several features of the insurance 
problem formulated here that might appear troubling even to the casual 
reader. In practice an insured party entering into an insurance contract 
would expect some stability in the premiums that are expected to be paid. 
A natural direction for further research is therefore to study how the notion 
of insurability of a model class changes when one imposes restrictions on 
how much the premium set by the insurer can vary from period to period. 
Another obvious shortcoming of the formulation of the insurance problem 
studied here is the assumption that the insured will accept any contract 
issued by the insurer. Since the insured in our model represents an aggregate 
of individual insured parties, a natural direction to make the framework 
more realistic would be to think of the insured parties as being of different 
types. This would in effect make the total realized premium from the insured 
(the aggregate of the insured parties) and the distribution of the realized 
loss in each period a function of the size of the premium per insured party 
set by the insurer in that period. Characterizing which model classes are 
insurable when the realized premium and the realized loss are functions of 
a set premium per insured party would be of considerable interest. 

Both for the loss domination problem and for the insurance problem, working with model classes for the loss sequence that allow for dependencies in the loss from period to period, for instance Markovian dependencies, would be another interesting direction for further research. Considering models with multiple, possibly competing insurers, as well as considering an insurer operating in multiple markets, where losses in one market can be offset by gains in another, also seem to be useful directions to investigate.

Appendix

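Lemma 4. Let $p$ and $q$ be probability distributions on $\mathbb{N}$ with support $\mathcal{A} \subset \mathbb{N}$. Suppose $J(p,q) \le \epsilon$. For any $S \subset \mathcal{A}$ and $\alpha < 1-\ln 2 \approx 0.30685$, if $p(S) \ge 1-\alpha$, then

$$q(S) \ge 1 - 2\epsilon - 2h(\alpha).$$

Proof. Let $p_S$ (respectively $q_S$) denote a binary distribution, the two probabilities of which correspond to $(p(S), 1-p(S))$ (respectively $(q(S), 1-q(S))$). Now,

$$\epsilon \ge J(p,q) \ge D\Bigl(p_S \,\Big\|\, \frac{p_S+q_S}{2}\Bigr) \ge p(S)\log\frac{2}{p(S)+q(S)} - h(p_S) \ge p(S)\log\frac{2}{p(S)+q(S)} - h(1-\alpha).$$

The last inequality follows because the condition $p(S) \ge 1-\alpha > \frac{1}{2}$ implies $h(p_S) \le h(1-\alpha) = h(\alpha)$. Therefore,

$$\log\frac{2}{p(S)+q(S)} \le \frac{h(\alpha)+\epsilon}{1-\alpha},$$

implying that

$$\frac{1+q(S)}{2} \ge \frac{p(S)+q(S)}{2} \ge 2^{-\frac{h(\alpha)+\epsilon}{1-\alpha}} \ge 1-(h(\alpha)+\epsilon),$$

where the last inequality follows since $\ln 2 < 1-\alpha$. □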
Lemma 5. Let $p$ and $q$ be probability distributions on $\mathbb{N}$. Then

$$\frac{1}{4\ln 2}|p-q|_1^2 \le J(p,q) \le \frac{1}{\ln 2}|p-q|_1.$$

If, in addition, $r$ is a probability distribution on $\mathbb{N}$, then

$$J(p,q) + J(q,r) \ge J^2(p,r)\,\frac{\ln 2}{8}.$$

Proof. The lower bound in the first statement follows since

$$D\Bigl(p \,\Big\|\, \frac{p+q}{2}\Bigr) \ge \frac{1}{2\ln 2}\Bigl|p - \frac{p+q}{2}\Bigr|_1^2 = \frac{1}{8\ln 2}|p-q|_1^2,$$

and similarly for $D\bigl(q \,\|\, \frac{p+q}{2}\bigr)$. The upper bound in the first statement follows since

$$J(p,q) \le \frac{1}{\ln 2}\sum_x \Bigl[\,p(x)\,\frac{p(x)-q(x)}{p(x)+q(x)} + q(x)\,\frac{q(x)-p(x)}{p(x)+q(x)}\Bigr] \le \frac{1}{\ln 2}|p-q|_1.$$

To prove the triangle-like inequality, note that

$$J(p,q)+J(q,r) \ge \frac{1}{4\ln 2}\bigl(|p-q|_1^2 + |q-r|_1^2\bigr) \ge \frac{1}{8\ln 2}\bigl(|p-r|_1\bigr)^2 \ge \frac{\ln 2}{8}J(p,r)^2,$$

where the last inequality follows from the upper bound on $J(p,r)$ already proved. □
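
The bounds of Lemma 5 are easy to probe numerically. The Python sketch below assumes that $J(p,q)$ is the sum of the two Kullback-Leibler divergences from $p$ and $q$ to their midpoint, measured in bits; this definition is an assumption made here because it is the one consistent with the constants in the lemma. All function names are hypothetical.

```python
import math
import random


def js_divergence_bits(p, q):
    """J(p,q) = D(p||m) + D(q||m) in bits, with m = (p+q)/2 (assumed definition)."""
    total = 0.0
    for a, b in zip(p, q):
        m = (a + b) / 2
        if a > 0:
            total += a * math.log2(a / m)
        if b > 0:
            total += b * math.log2(b / m)
    return total


def l1(p, q):
    """l1 distance between two distributions given as equal-length lists."""
    return sum(abs(a - b) for a, b in zip(p, q))


def random_distribution(k, rng):
    """A random probability vector of length k."""
    weights = [rng.random() for _ in range(k)]
    s = sum(weights)
    return [w / s for w in weights]


def lemma5_holds(p, q, r, tol=1e-12):
    """Check the two-sided bound on J(p,q) and the triangle-like inequality."""
    d, j = l1(p, q), js_divergence_bits(p, q)
    sandwich = d * d / (4 * math.log(2)) <= j + tol and j <= d / math.log(2) + tol
    triangle = (j + js_divergence_bits(q, r) + tol
                >= js_divergence_bits(p, r) ** 2 * math.log(2) / 8)
    return sandwich and triangle
```

For instance, two point masses on different symbols have $|p-q|_1 = 2$ and $J(p,q) = 2$ bits under this definition, which sits between the lower bound $1/\ln 2$ and the upper bound $2/\ln 2$.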

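Lemma 6. Let $\epsilon_0 > 0$. If

$$|p_0-q|_1 \le \frac{\epsilon_0^2(\ln 2)^2}{16},$$

then for all $p \in \mathcal{P}$ with $J(p,p_0) \ge \epsilon_0$, we have

$$J(p,q) \ge \frac{\epsilon_0^2\ln 2}{16}.$$

Proof. Since

$$|p_0-q|_1 \le \frac{\epsilon_0^2(\ln 2)^2}{16},$$

Lemma 5 implies that

$$J(p_0,q) \le \frac{\epsilon_0^2\ln 2}{16}.$$

Further, Lemma 5 then implies that

$$J(p,q) + \frac{\epsilon_0^2\ln 2}{16} \ge J(p,q) + J(p_0,q) \ge \frac{J^2(p,p_0)\ln 2}{8} \ge \frac{\epsilon_0^2\ln 2}{8},$$

where the last inequality follows since $J(p,p_0) \ge \epsilon_0$. □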

Lemma 7. Let $p$ be any probability distribution on $\mathbb{N}$. Let $\delta > 0$ and let $k \ge 2$ be an integer. Let $X^n$ be a sequence generated i.i.d. with marginals $p$ and let $q(X^n)$ be the empirical distribution of $X^n$. Then

$$p\Bigl(|q(X^n)-p|_1 \ge \delta \text{ and } F_q^{-1}(1-\delta/3) \le k\Bigr) \le (2^k-2)\exp\Bigl(-\frac{n\delta^2}{18}\Bigr).$$

Proof. There is a similar lemma in [8]. The difference from [8] is that the right side of the inequality above does not depend on $p$, and this property is crucial for its use here.

The starting point is the following result. Suppose $p'$ is a probability distribution on $\mathbb{N}$ with finite support of size $L$. Then from [9], if we consider length-$n$ sequences,

$$p'\bigl(|q(X^n)-p'|_1 < \delta\bigr) \ge 1-(2^L-2)\exp\Bigl(-\frac{n\delta^2}{2}\Bigr). \tag{7}$$

Since $k \ge 2$, consider the distributions $p'$ and $q'$ with support $A = \{1,\ldots,k-1\}\cup\{-1\}$, obtained as

$$p'(i) = \begin{cases} p(i) & 1 \le i < k,\\ \sum_{j=k}^{\infty} p(j) & i = -1,\end{cases}$$

and similarly for $q'$. From (7),

$$p\bigl(|p'-q'|_1 \ge \delta/3\bigr) \le (2^k-2)\exp\Bigl(-\frac{n\delta^2}{18}\Bigr).$$

We will show that if $F_q^{-1}(1-\delta/3) \le k$ and $|p-q|_1 \ge \delta$ then $q'(-1) \le \delta/3$ and $|p'-q'|_1 \ge \delta/3$. Thus, we will have

$$
\begin{aligned}
p\bigl(|q-p|_1 \ge \delta \text{ and } F_q^{-1}(1-\delta/3)\le k\bigr)
&\le p\bigl(|p'-q'|_1 \ge \delta/3 \text{ and } q'(-1)\le\delta/3\bigr)\\
&\le (2^k-2)\exp\Bigl(-\frac{n\delta^2}{18}\Bigr).
\end{aligned}
$$

Finally, as in [8],

$$
\begin{aligned}
|p-q|_1 - \sum_{l=1}^{k-1}|p(l)-q(l)| &= \sum_{j=k}^{\infty}|p(j)-q(j)|\\
&\le \sum_{j=k}^{\infty}\bigl(p(j)-q(j)\bigr) + 2\sum_{j=k}^{\infty}q(j)\\
&\le |p'(-1)-q'(-1)| + 2\delta/3.
\end{aligned}
$$

Since $p(l) = p'(l)$ and $q(l) = q'(l)$ for all $l = 1,\ldots,k-1$, we have $|p'-q'|_1 \ge |p-q|_1 - 2\delta/3$. If $|p-q|_1 \ge \delta$ in addition, $|p'-q'|_1 \ge \delta/3$. □
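
The tail-collapsing step in the proof can be illustrated with a small Python sketch. It assumes distributions with finite support stored as equal-length lists (entry $i-1$ holds the probability of symbol $i$); the helper names are hypothetical. The last function returns the slack in the inequality $|p'-q'|_1 \ge |p-q|_1 - 2q'(-1)$, which the argument above shows is never negative.

```python
def collapse_tail(p, k):
    """Collapse symbols k, k+1, ... of p into a single symbol labelled -1.

    p[i - 1] is the probability of symbol i; requires 2 <= k <= len(p).
    """
    collapsed = {i: p[i - 1] for i in range(1, k)}
    collapsed[-1] = sum(p[k - 1:])
    return collapsed


def l1_lists(p, q):
    """l1 distance between two distributions given as equal-length lists."""
    return sum(abs(a - b) for a, b in zip(p, q))


def l1_collapsed(pc, qc):
    """l1 distance between two collapsed distributions sharing the same keys."""
    return sum(abs(pc[i] - qc[i]) for i in pc)


def tail_inequality_gap(p, q, k):
    """Return |p'-q'|_1 - (|p-q|_1 - 2*q'(-1)); the proof shows this is >= 0."""
    pc, qc = collapse_tail(p, k), collapse_tail(q, k)
    return l1_collapsed(pc, qc) - (l1_lists(p, q) - 2 * qc[-1])
```

With $p = (1/2, 1/4, 1/8, 1/8)$, $q = (1/4, 1/2, 3/16, 1/16)$ and $k = 3$, collapsing gives $|p'-q'|_1 = 1/2$ while $|p-q|_1 - 2q'(-1) = 1/8$, so the inequality holds with room to spare.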

References 

[1] K. Englund and A. Martin-Löf. Statisticians of the Centuries, chapter Ernst Filip Oskar Lundberg, pages 308-311. New York: Springer, 2001.

[2] H. Cramér. Historical Review of Filip Lundberg's Work on Risk Theory. Skandinavisk Aktuarietidskrift (Suppl.), 52:6-12, 1969. Reprinted in The Collected Works of Harald Cramér, edited by Anders Martin-Löf, 2 volumes, Springer, 1994.

[3] S. Asmussen and H. Albrecher. Ruin Probabilities. World Scientific Publishing Company, 2nd edition, 2010.

[4] N. Santhanam and V. Anantharam. Agnostic insurance tasks and their relation to compression. In International Conference on Signal Processing and Communications (SPCOM), 2012.

[5] J. Dugundji. Topology. Allyn and Bacon Inc., Boston, 1970.

[6] K.L. Chung. A note on the ergodic theorem of information theory. Annals of Mathematical Statistics, 32:612-614, 1961.

[7] N. Santhanam and V. Anantharam. Prediction over countable alphabets. In Conference on Information Sciences and Systems, 2012.

[8] S. Ho and R. Yeung. On information divergence measures and joint typicality. IEEE Transactions on Information Theory, 56(12):5893-5905, 2010.

[9] T. Weissman, E. Ordentlich, G. Seroussi, S. Verdú, and M. Weinberger. Universal discrete denoising: known channel. IEEE Transactions on Information Theory, 51(1):5-28, 2005. See also HP Labs Tech Report HPL-2003-29, Feb 2003.



imsart-generic ver. 2007/12/10 file: arxiv.tex date: January 31, 2013 



